
Latency vs Throughput

📌 Latency

Definition: The time taken to process a single request or operation, from initiation to completion.

  • Unit: Measured in milliseconds (ms) or seconds
  • Analogy: The time it takes a single car to travel from point A to point B

Examples in System Design

  • Time taken for an API call to return a response
  • Time for a database to fetch one record
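To make this concrete, latency can be measured by timing a single operation end to end. A minimal Python sketch (the URL is a placeholder, and the call needs network access):

```python
import time
import urllib.request

def measure_latency(url: str) -> float:
    """Time one request from initiation to completion, in milliseconds."""
    start = time.perf_counter()
    urllib.request.urlopen(url, timeout=5).read()
    return (time.perf_counter() - start) * 1000

# https://example.com is a placeholder; substitute a real endpoint
print(f"Latency: {measure_latency('https://example.com'):.1f} ms")
```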

👉 Low latency means fast response times. It's critical for user-facing systems such as search engines, online games, and trading systems.


📌 Throughput

Definition: The number of requests a system can handle per unit of time.

  • Unit: Measured in requests per second (RPS), queries per second (QPS), or transactions per second (TPS)
  • Analogy: The number of cars passing through a highway per minute

Examples in System Design

  • A web server handling 10,000 requests/sec
  • Kafka processing 1 million messages/sec
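Throughput can be estimated the same way by counting how many operations complete in a fixed window. A minimal single-threaded sketch, where `handler` stands in for whatever operation the system serves:

```python
import time

def measure_throughput(handler, duration_sec: float = 1.0) -> float:
    """Count how many calls to handler() complete within a time window."""
    completed = 0
    deadline = time.perf_counter() + duration_sec
    while time.perf_counter() < deadline:
        handler()
        completed += 1
    return completed / duration_sec

# A trivial in-process operation as a stand-in for a real request handler
print(f"Throughput: {measure_throughput(lambda: sum(range(100))):.0f} ops/sec")
```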

👉 High throughput means the system can handle more load.


⚖️ Relationship Between Latency and Throughput

They are related but not the same:

Different Combinations

  • Low latency but low throughput: A system that responds quickly but can't handle many users at once
  • High throughput but high latency: A batch processing system that processes thousands of records at once but takes minutes before any individual result is ready

Trade-offs

Improving one often comes at the cost of the other (a worked example follows this list):

  • Adding parallelism can improve throughput but may increase latency due to coordination overhead
  • Optimizing for low latency (e.g., caching, indexing) may reduce maximum throughput; for example, an index speeds up reads but adds work to every write
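One way to reason about the relationship quantitatively is Little's Law: on average, concurrency = throughput × latency. A back-of-the-envelope sketch with assumed numbers:

```python
# Little's Law: concurrency = throughput * latency
# Rearranged:   throughput  = concurrency / latency

latency_sec = 0.050   # assumed: 50 ms per request
concurrency = 200     # assumed: 200 requests in flight at once

print(f"{concurrency / latency_sec:.0f} requests/sec")   # 4000 requests/sec

# Halving latency doubles throughput at the same concurrency:
print(f"{concurrency / 0.025:.0f} requests/sec")          # 8000 requests/sec
```

At fixed concurrency, cutting latency raises throughput; conversely, once a system saturates, pushing more load makes requests queue, which raises latency without raising throughput.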

📊 Real-World Example: Ride-sharing App (Uber, Ola)

Metric       Description
Latency      Time taken to show nearby drivers after you open the app
Throughput   Number of ride requests the system can handle globally per second

🛠 Optimization Strategies

To Reduce Latency

  • Use caching (Redis, CDN); a minimal sketch follows this list
  • Optimize database queries (indexes, denormalization)
  • Reduce network hops
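To make the caching idea concrete, here is a cache-aside pattern in miniature; `slow_db_fetch` is a hypothetical stand-in for a real database query, simulated with a sleep:

```python
import functools
import time

def slow_db_fetch(user_id: int) -> dict:
    """Hypothetical stand-in for a real database query (~50 ms)."""
    time.sleep(0.05)
    return {"id": user_id, "name": f"user-{user_id}"}

# Cache-aside in miniature: repeated reads are served from memory,
# cutting latency from ~50 ms to microseconds on a hit.
@functools.lru_cache(maxsize=1024)
def fetch_user(user_id: int) -> dict:
    return slow_db_fetch(user_id)  # runs only on a cache miss

fetch_user(42)  # miss: ~50 ms
fetch_user(42)  # hit: near-instant
```

In production the same pattern typically uses a shared cache such as Redis rather than per-process memory, so all servers see the same cached entries.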

To Increase Throughput

  • Horizontal scaling (more servers; see the sketch after this list)
  • Message queues (Kafka, RabbitMQ)
  • Batch processing
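The effect of adding capacity can be seen even inside one process: parallel workers raise throughput while each request's latency stays roughly the same. A sketch with a simulated I/O-bound handler (the 10 ms wait is an assumption):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i: int) -> int:
    """Hypothetical I/O-bound handler, simulated with a 10 ms wait."""
    time.sleep(0.010)
    return i

requests = range(100)

# Sequentially: 100 requests x 10 ms ~= 1 s (~100 requests/sec).
# With 10 workers the same work finishes in ~0.1 s (~1000 requests/sec),
# while each individual request still takes ~10 ms.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(handle_request, requests))
elapsed = time.perf_counter() - start
print(f"{len(requests) / elapsed:.0f} requests/sec")
```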

💡 Key Takeaways

Latency = the time one request takes

Throughput = how many requests complete per unit of time

Understanding both metrics is crucial for designing scalable systems that meet user expectations and business requirements.